
    Optimal Rates for the Random Fourier Feature Method

    Kernel methods are among the most powerful tools in machine learning for tackling problems expressed in terms of function values and derivatives. While these methods are highly versatile, they are computationally intensive and scale poorly to large data, as they require operations on Gram matrices. To mitigate this serious computational limitation, randomized methods have recently been proposed in the literature, which allow the application of fast linear algorithms. Random Fourier features (RFF) are among the most popular and widely applied constructions: they provide an easily computable, low-dimensional feature representation for shift-invariant kernels. Despite the popularity of RFFs, very little is understood theoretically about their approximation quality. In this talk, I am going to present the main ideas and results of a detailed finite-sample theoretical analysis of the approximation quality of RFFs, (i) establishing optimal (in terms of the RFF dimension and growing set size) performance guarantees in uniform norm, and (ii) providing guarantees in Lr (1 ≤ r < ∞) norms. I will also propose an RFF approximation to kernel derivatives, with a theoretical study of its approximation quality.
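    As a concrete illustration of the construction analyzed in this talk, below is a minimal sketch of random Fourier features for the Gaussian kernel; the function name, parameter choices, and toy error check are illustrative assumptions, not material from the abstract.

```python
import numpy as np

def random_fourier_features(X, dim, sigma=1.0, seed=None):
    """Map X of shape (n, d) to a dim-dimensional embedding z(X) whose inner
    products approximate the Gaussian kernel exp(-||x - y||^2 / (2 * sigma**2))."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, dim))   # frequencies drawn from the kernel's spectral density
    b = rng.uniform(0.0, 2.0 * np.pi, size=dim)        # random phases
    return np.sqrt(2.0 / dim) * np.cos(X @ W + b)

# Compare the RFF approximation with the exact Gaussian kernel on toy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Z = random_fourier_features(X, dim=2000, sigma=1.0, seed=1)
K_rff = Z @ Z.T
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-sq_dists / 2.0)
print("max absolute error:", np.abs(K_rff - K_exact).max())
```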

    Geometrical Insights for Implicit Generative Modeling

    Learning algorithms for implicit generative models can optimize a variety of criteria that measure how the data distribution differs from the implicit model distribution, including the Wasserstein distance, the Energy distance, and the Maximum Mean Discrepancy criterion. A careful look at the geometries induced by these distances on the space of probability measures reveals interesting differences. In particular, we can establish surprising approximate global convergence guarantees for the 1-Wasserstein distance, even when the parametric generator has a nonconvex parametrization.
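    To make two of the criteria mentioned above concrete, the following sketch estimates the squared MMD (with a Gaussian kernel) and the energy distance from two samples; it is only an illustration of these discrepancies on toy data, not the paper's implementation, and the bandwidth and sample sizes are arbitrary.

```python
import numpy as np

def sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A and B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)

def mmd2(X, Y, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    k = lambda A, B: np.exp(-sq_dists(A, B) / (2.0 * sigma**2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

def energy_distance(X, Y):
    """Sample estimate of the energy distance between X and Y."""
    d = lambda A, B: np.sqrt(sq_dists(A, B)).mean()
    return 2.0 * d(X, Y) - d(X, X) - d(Y, Y)

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 2))   # "data" samples
Y = rng.normal(0.5, 1.0, size=(500, 2))   # "model" samples
print("MMD^2:", mmd2(X, Y), " energy distance:", energy_distance(X, Y))
```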

    ForestHash: Semantic Hashing With Shallow Random Forests and Tiny Convolutional Networks

    Hash codes are efficient data representations for coping with the ever-growing amounts of data. In this paper, we introduce a random forest semantic hashing scheme that embeds tiny convolutional neural networks (CNN) into shallow random forests, with near-optimal information-theoretic code aggregation among trees. We start with a simple hashing scheme, where random trees in a forest act as hashing functions by setting '1' for the visited tree leaf and '0' for the rest. We show that traditional random forests fail to generate hashes that preserve the underlying similarity between the trees, rendering the random forests approach to hashing challenging. To address this, we propose to first randomly group arriving classes at each tree split node into two groups, obtaining a significantly simplified two-class classification problem that can be handled using a lightweight CNN weak learner. Such a random class grouping scheme enables code uniqueness by forcing each class to share its code with different classes in different trees. A non-conventional low-rank loss is further adopted for the CNN weak learners to encourage code consistency by minimizing intra-class variations and maximizing inter-class distance for the two random class groups. Finally, we introduce an information-theoretic approach for aggregating the codes of individual trees into a single hash code, producing a near-optimal unique hash for each class. The proposed approach significantly outperforms state-of-the-art hashing methods for image retrieval on large-scale public datasets, while performing at the level of state-of-the-art image classification techniques and using a more compact, efficient, and scalable representation. This work proposes a principled and robust procedure to train and deploy in parallel an ensemble of lightweight CNNs, instead of simply going deeper.
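    For intuition about the simple starting scheme only (trees as hashing functions that set '1' for the visited leaf and '0' elsewhere, with per-tree codes concatenated), here is a hypothetical toy sketch using scikit-learn decision trees; it deliberately omits the paper's CNN weak learners, random class grouping, low-rank loss, and information-theoretic code aggregation.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.tree import DecisionTreeClassifier

# Toy version of the starting scheme: each shallow random tree hashes a sample by
# marking the leaf it falls into with '1' and every other leaf with '0'; the
# per-tree codes are then concatenated into one binary code per sample.
X, y = load_digits(return_X_y=True)

codes = []
for t in range(4):  # four shallow random trees
    tree = DecisionTreeClassifier(max_depth=3, splitter="random", random_state=t)
    tree.fit(X, y)
    leaf_ids = tree.apply(X)                          # leaf index of each sample
    leaves = np.unique(leaf_ids)
    codes.append((leaf_ids[:, None] == leaves).astype(np.uint8))

hash_codes = np.concatenate(codes, axis=1)
print(hash_codes.shape, hash_codes[0])
```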

    Domain Adaptation Transfer Learning by Kernel Representation Adaptation

    Domain adaptation, where no labeled target data is available, is a challenging task. To solve this problem, we first propose a new SVM-based approach with a supplementary Maximum Mean Discrepancy (MMD)-like constraint. With this heuristic, source and target data are projected onto a common subspace of a Reproducing Kernel Hilbert Space (RKHS) where both data distributions are expected to become similar. Therefore, a classifier trained on source data might perform well on target data, provided the conditional probabilities of labels are similar for source and target data, which is the main assumption of this paper. We demonstrate that adding this constraint does not change the quadratic nature of the optimization problem, so we can use common quadratic optimization tools. Second, building on the same idea that making source and target data similar can enable efficient transfer learning, and under the same assumption, we propose a Kernel Principal Component Analysis (KPCA) based transfer learning method. Unlike the first heuristic, this second method also aligns higher-order moments in the RKHS, which leads to better performance. Here again, we select MMD as the similarity measure. A linear transformation is then applied to further improve the alignment between source and target data. We finally compare both methods with other transfer learning methods from the literature to show their efficiency on synthetic and real datasets.
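    As a rough illustration of the second heuristic (project source and target into a shared kernel feature space, apply a simple linear transformation, and use MMD as the similarity measure), here is a hypothetical sketch; the joint KPCA projection, the mean-shift alignment, and all parameter values are my own simplifications, not the method proposed in the paper.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

def mmd2_rbf(X, Y, gamma=0.5):
    """Biased estimate of squared MMD with an RBF kernel, used as the similarity measure."""
    sq = lambda A, B: ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    k = lambda A, B: np.exp(-gamma * sq(A, B))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(300, 4))    # source samples (features only, labels omitted)
Xt = rng.normal(0.6, 1.2, size=(300, 4))    # shifted, unlabeled target samples

# Project source and target jointly with KPCA, then apply a simple linear
# transformation (re-centering the target on the source mean) in the projected space.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.5)
Z = kpca.fit_transform(np.vstack([Xs, Xt]))
Zs, Zt = Z[: len(Xs)], Z[len(Xs):]
Zt_aligned = Zt - Zt.mean(axis=0) + Zs.mean(axis=0)

print("MMD^2 in input space:          ", mmd2_rbf(Xs, Xt))
print("MMD^2 after KPCA + mean shift: ", mmd2_rbf(Zs, Zt_aligned))
```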